-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional RNNoise support to AudioBridge #3185
Conversation
I spent some time investigating what was wrong, and managed to fix the broken audio and artifacts in all combinations of sampling rate, mono and stereo alike. It seems to be working fine to me, now (even disabling/enabling denoising on the fly, where you can actually hear it at work), so I marked the PR as ready for review. That said, I'll obviously wait for more feedback before (if) we merge this, also taking into account the jitter buffer PR would need to be merged first (which means I'll have to rebase this accordingly). |
Hi! Is there a chance that this feature will be in VideoRoom Plugin? |
No, because the VideoRoom plugin only relays packets, it doesn't decode them as the AudioBridge does. |
Hi! Thanks for this PR! We were using a frontend version of the same RNNoise with the default model via WASM and it doesn't affect the original voice quality much. (https://github.com/jitsi/rnnoise-wasm) But, when testing this PR it results in a very "strangled" voice quality compared to the WASM one. Would you say that this is related to the fact that it's used on the original lossless microphone samples before encoding frontend side, or rather something related to the usage of RNNoise in this PR ? PS: I could provide 3 simultaneous audio recordings of the original, denoised-frontend, denoised-janus sound in a noisy environment if it helps Thanks again ! |
I think it depends on the sampling rate you're using in the AudioBridge. If you use, e.g., 16000, resampling will be performed, and I think it may be that the way we do it causes what you hear. If in a 48000 room the audio is good, and comparable in your tests, that's likely it. |
Indeed i retested with 24000/48000 room sampling rate and it works perfectly fine. I'll be moving this to production tonight for some 500 daily users (some OPUS/48000 some PCMA/8000 (SIP & PSTN bridged from the SIP plugin to the AudioBridge plugin)) I'll let you guys know how it works out for quality & CPU usage. Thanks again great work 🙏 |
We've seen no significant CPU usage increase with this enabled in production (2 to 4 percentage points increase in average CPU usage across multiple 32 CPU VMs). Although, it does seem to add some "barely noticeable" cumulative delay in audio contributions with denoise enabled (just under the 200ms mark) for hour long conferences. |
@Odinvt are you sure the observed delay is due to the denoiser? |
@atoppi Indeed I was on v1.1.4 stable before. I only switched to v1.2.0 because this merge was rebased on top of speexdsp's jitter buffer. So i never got to actually test v1.2.0 separately. I will pay attention to the mentioned values for both denoise enabled and disabled, and get back to you asap. Thanks ! |
Hi, we are testing your PR. Denoise is working when initially joined with denoise: true. But changes on the fly with new configure request aren't working:
requests abobe doesn't change anything on the fly. Is anything wrong with them? |
@lminiero any plans to merge this? |
@spscream Does it work when using the synchronous @brave44 we do plan to merge this, but not sure yet, as we're still waiting for more feedback; besides, we may have to improve how it works when doing 8000/16000 sampling rates. |
Thanks, yeah, for us it works fine with 4800 sampling rate. |
I apologize for the late reply. The audio delay was due to our "tree" scaling's forwards going through a VXLAN over VPN which encryption was not hardware accelerated.
@lminiero the synchronous requests work fine we've been using them in production also.
@atoppi Will do. Thanks ! |
@atoppi thanks, we will check it |
Can't help on forks, sorry. If you can find if any commit we made is causing the problem, please do let us know. |
@lminiero sure, we will do our best. |
@lminiero we checked on master + our commit today and we have no troubles on it. We will check the latest changes on current branch afternoon. |
@lminiero moved to latest changes from this branch, no problems now for us. |
…or resamplers. Always return the denoise flag in the admin response. Add missing docs.
We have some troubles with audiobridge. If publisher have bad network connection we hear crack sounds and it appears on every call with such clients. After change ec52fbb is a bit better, but doesn't remove issue completly. How can we debug it and help to address it? btw on vr publisher from the same users we don't have these cracks |
I tested it using clamsy utility https://jagt.github.io/clumsy/ - cracks starts after adding drops over 5% |
Unless these reports are related to the RNNoise effort, this is the wrong place to talk about it. |
I created #3297 issue to adress it. |
We are currently testing your PR. As part of our testing process, we're setting up an audio bridge room with the RNNoise feature activated, by setting the "denoise" option to true. In certain circumstances, we've noticed an echo effect. This typically happens when using the laptop's built-in speaker and microphone. Interestingly, this issue does not occur when headphones are utilized. |
@mail2mhossain that sounds more like an issue with the client environment.
Do your clients have echo cancellation mechanisms?
|
More importantly, is this indeed a problem with the RNNoise integration? Meaning, does it happen when RRNoise is enabled in the AudioBridge, but doesn't when the AudioBridge doesn't? If not, it's irrelevant to this PR, and as Alessandro said a client side problem (just wear a headset). |
It appears that the issue we're encountering, specifically the echo experienced when using the laptop's built-in speaker and microphone, may indeed be related to the absence of RNNoise integration. We are considering integrating RNNoise as a potential solution to address this challenge. The issue occurs both when RRNoise is activated in the AudioBridge and also when it's not integrated with the AudioBridge. Our application utilizes a desktop client that incorporates the Sipsorcery WebRTC library, and for audio functionalities, we're leveraging the SIPSorceryMedia.SDL2 library. Based on your feedback, it appears that echo cancellation features are expected to be handled by the client library, in this case, SIPSorceryMedia.SDL2. Consequently, I'm understanding that RNNoise might not support this specific type of noise cancellation, correct? |
Could RNNoise effectively resolve the echo issue we're facing with the laptop's built-in speaker and microphone? Or, would it be more advisable to incorporate a noise reduction library on the client side? |
Noise suppression and echo suppression are very different processes. Noise suppression subtracts from a generic stream what is being recognized as background noise and could be performed in both media server (audiobridge with RNNoise in this case) or in the endpoint. |
Merging. |
This PR is an attempt to revive an effort initially contributed by @mirkobrankovic in #2260, by adding an optional and configurable support for RNNoise to the AudioBridge plugin: the idea is to basically add a mechanism to perform noise reduction in audio rooms with the help of the RNNoise library.
This patch is is quite different from Mirko's original contribution, though:
Apart from that, I added ways to selectively enable/disable the feature. First of all, you can configure a room to enable denoising by default, by setting the
denoise
property totrue
: in that case, any participant that joins the room resunts in a denoiser instance created for them, unless they explicitly provide adenoise: false
property when joining. A participant that joins a room where denoising is not enabled by default, can instead create a denoiser by joining with adenoise: true
property. Denoising can also be enabled/disabled dynamically: participants can do that for themselves via aconfigure
request, while room owners can use the synchronousdenoise_enable
anddenoise_disable
requests instead, by specifying the room and the specific participant to impact.Coming to the actual implementation, it only partly works, due to some constraints in the RNNoise library that I haven't overcome entirely. Specifically, the
rnnoise_process_frame
function expects a buffer of exactly 480 samples to denoise: it can't be more and it can't be less. Depending on the sampling rate in use in the room, a single audio packet of 20ms received via WebRTC can contain a different number of samples: when using 16000 as a sampling rate, for instance, it will be 320 samples; 480 for 24000; 960 for 48000. This means that, at the moment, denoising works fine when you use a sampling rate of 24000 or 48000 (since can do one or two rounds of denoising with samples that are multiples of 480), while you get audio artifacts when using lower sampling rates instead. At the moment, I'm also getting artifacts when using stereo rooms (spatial audio): I do know that Opus uses interleaving when stereo is used, but even taking that into account (and so performing multiple rounds of denoising on different subsets of 480 samples) the artifacts are still there.For this reasons, I decided to submit this as a draft pull request: in fact, it kinda works already, but it also definitely needs improvements, so I'm hoping that some fresh pairs of eyes looking at the code and/or people testing and providing feedback may help address what's currently not 100% working instead.
Tagging as
multistream
since I implemented this in1.x
, but this will be trivial to port to0.x
as well should this be merged eventually.